Nonconvex Loss


Black-Box Generalization: Stability of Zeroth-Order Learning

Neural Information Processing Systems

We provide the first generalization error analysis for black-box learning through derivative-free optimization. Under the assumption of a Lipschitz and smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS) algorithm, which updates a $d$-dimensional model by replacing stochastic gradient directions with stochastic differences of $K+1$ perturbed loss evaluations per dataset (example) query. For both unbounded and bounded, possibly nonconvex, losses we present the first generalization bounds for the ZoSS algorithm. These bounds coincide with those for SGD and are independent of $d$, $K$, and the batch size $m$, under an appropriate, slightly decreased learning rate. For bounded nonconvex losses and batch size $m=1$, we additionally show that both the generalization error and the learning rate are independent of $d$ and $K$, and remain essentially the same as for SGD, even with only two function evaluations. Our results substantially extend, and consistently recover, established generalization bounds and corresponding learning rates for SGD in prior work. If additionally $m=n$, where $n$ is the dataset size, we recover generalization guarantees for full-batch GD as well.
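The gradient-free update described above can be sketched as a multi-point finite-difference estimator: $K$ perturbed loss evaluations plus one at the current point yield an approximate gradient direction. The sketch below (function names, unit-sphere perturbations, and the toy quadratic are illustrative assumptions, not the paper's exact scheme) shows the idea.

```python
import numpy as np

def zo_gradient(f, x, K=4, mu=1e-4, rng=None):
    """Multi-point zeroth-order gradient estimate from K+1 loss
    evaluations: f(x) plus K points perturbed along random unit
    directions. A generic sketch of a ZoSS-style search direction,
    not necessarily the paper's exact estimator."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    fx = f(x)
    g = np.zeros(d)
    for _ in range(K):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)              # random unit direction
        g += (f(x + mu * u) - fx) / mu * u  # directional finite difference
    return (d / K) * g                      # rescale toward unbiasedness

# Zeroth-order "SGD" loop on a toy quadratic loss
f = lambda x: 0.5 * np.dot(x, x)
x = np.ones(3)
rng = np.random.default_rng(0)
for _ in range(200):
    x = x - 0.1 * zo_gradient(f, x, rng=rng)
print(np.linalg.norm(x))  # shrinks toward 0
```

With only gradient-free loss queries, the iterate still contracts toward the minimizer; the smoothing radius `mu` trades finite-difference bias against numerical noise.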


Scalable Robust Matrix Factorization with Nonconvex Loss

Neural Information Processing Systems

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of a nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.
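The MM idea behind nonconvex-loss matrix factorization can be illustrated with a reweighted least-squares surrogate: at each step a bounded nonconvex loss (the Geman–McClure loss is used here as an example; the paper's loss family and its dual accelerated-proximal-gradient solver differ) is majorized by a weighted quadratic, which alternating least squares then decreases.

```python
import numpy as np

def geman_mcclure(r, c=1.0):
    """Nonconvex robust loss rho(r) = r^2 / (r^2 + c^2): bounded, so
    large outlier residuals have vanishing influence."""
    return r**2 / (r**2 + c**2)

def mm_rmf(X, rank=2, iters=100, c=1.0, rng=None):
    """Generic MM / reweighted-least-squares sketch of robust MF with
    a nonconvex loss. Each outer step majorizes the loss by a weighted
    quadratic surrogate (weights rho'(r)/(2r)), then decreases it with
    ridge-regularized alternating least squares. Illustrative only;
    not the paper's sparsity-aware dual solver."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(iters):
        R = X - U @ V.T
        W = c**2 / (R**2 + c**2) ** 2          # surrogate weights rho'(r)/(2r)
        for i in range(m):                     # row-wise weighted LS for U
            D = np.diag(W[i])
            U[i] = np.linalg.solve(V.T @ D @ V + 1e-8 * np.eye(rank),
                                   V.T @ D @ X[i])
        for j in range(n):                     # column-wise weighted LS for V
            D = np.diag(W[:, j])
            V[j] = np.linalg.solve(U.T @ D @ U + 1e-8 * np.eye(rank),
                                   U.T @ D @ X[:, j])
    return U, V
```

Because the loss is bounded, gross outliers receive near-zero weight and barely perturb the recovered factors, whereas an $\ell_2$ fit would be dragged toward them.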


Stochastic Optimization Schemes for Performative Prediction with Nonconvex Loss

Neural Information Processing Systems

This paper studies a risk minimization problem with a decision-dependent data distribution. The problem pertains to the performative prediction setting, in which a trained model can affect the outcome estimated by the model. Such dependency creates a feedback loop that influences the stability of optimization algorithms such as stochastic gradient descent (SGD). We present the first study of performative prediction with a smooth but possibly nonconvex loss, and analyze a greedy deployment scheme with SGD (SGD-GD).
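The greedy deployment scheme can be sketched on a toy problem: each sample is drawn from a distribution shifted by the currently deployed model (a hypothetical location family; the shift strength `eps` and squared loss are illustrative assumptions, not the paper's general setting), and an ordinary SGD step follows.

```python
import numpy as np

def sgd_gd(mu=1.0, eps=0.4, lr=0.05, steps=4000, seed=0):
    """Greedy-deployment SGD (SGD-GD) sketch for performative
    prediction: data z ~ N(mu + eps * theta, 1) depends on the
    deployed model theta, and each step applies SGD to the squared
    loss (theta - z)^2. Returns the average of the second half of
    the iterates to smooth out gradient noise."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    tail = []
    for t in range(steps):
        z = mu + eps * theta + rng.standard_normal()  # decision-dependent draw
        theta -= lr * 2.0 * (theta - z)               # plain SGD step
        if t >= steps // 2:
            tail.append(theta)
    return np.mean(tail)

# The performatively stable point solves theta = mu + eps * theta,
# i.e. theta* = mu / (1 - eps) = 1 / 0.6 here.
print(sgd_gd())
```

Despite the feedback loop, the iterates settle near the performatively stable point rather than the minimizer of any fixed distribution, which is exactly the stability phenomenon the analysis targets.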


A Flexible Fairness Framework with Surrogate Loss Reweighting for Addressing Sociodemographic Disparities

Xu, Wen, Dolatabadi, Elham

arXiv.org Artificial Intelligence

This paper presents a new algorithmic fairness framework called $\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ Fair Machine Learning ($\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ FML), designed to optimize fairness levels across sociodemographic attributes. Our framework employs a new family of surrogate loss functions, paired with loss reweighting techniques, allowing precise control over fairness-accuracy trade-offs through tunable hyperparameters $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$. To efficiently solve the learning objective, we propose Parallel Stochastic Gradient Descent with Surrogate Loss (P-SGD-S) and establish convergence guarantees for both convex and nonconvex loss functions. Experimental results demonstrate that our framework improves overall accuracy while reducing fairness violations, offering a smooth trade-off between standard empirical risk minimization and strict minimax fairness. Results across multiple datasets confirm its adaptability, ensuring fairness improvements without excessive performance degradation.
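The loss-reweighting idea can be sketched with a power-mean surrogate over per-group losses: weighting each group's gradient by $L_g^{\alpha-1}$ interpolates between plain ERM ($\alpha = 1$) and minimax fairness ($\alpha \to \infty$). This single-parameter family and all names below are hypothetical illustrations of reweighting, not the paper's exact $\alpha$-$\beta$ surrogate or its parallel P-SGD-S solver.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_sgd(X, y, groups, alpha=2.0, lr=0.1, steps=500):
    """Surrogate-loss-reweighting sketch: each gradient step weights
    every group's logistic-loss gradient by L_g^(alpha - 1), so
    larger alpha shifts effort toward the worst-off group.
    Hypothetical illustration, not the paper's alpha-beta FML."""
    w = np.zeros(X.shape[1])
    gids = np.unique(groups)
    eps = 1e-12
    for _ in range(steps):
        p = sigmoid(X @ w)
        Ls, grads = [], []
        for g in gids:                          # per-group loss and gradient
            m = groups == g
            Ls.append(-np.mean(y[m] * np.log(p[m] + eps)
                               + (1 - y[m]) * np.log(1 - p[m] + eps)))
            grads.append(X[m].T @ (p[m] - y[m]) / m.sum())
        wts = np.array(Ls) ** (alpha - 1)
        wts /= wts.sum()                        # normalized group weights
        w -= lr * sum(a * gr for a, gr in zip(wts, grads))
    return w
```

Setting `alpha=1` makes the weights uniform and recovers standard (group-averaged) empirical risk minimization, so the fairness strength is tuned continuously by a single knob, mirroring the trade-off the abstract describes.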


Reviews: Scalable Robust Matrix Factorization with Nonconvex Loss

Neural Information Processing Systems

This paper proposes an algorithm for matrix factorization (MF) that provides better robustness to outliers than state-of-the-art algorithms for robust MF (RMF). Another nice property of the new algorithm is that it is suitable for sparse data matrices, a clear advantage over previous approaches to RMF. Essentially, the authors propose replacing the $\ell_1$-loss with a nonconvex loss and provide results for various choices of that loss. Since the resulting optimization problem is not simple, the authors use a majorization-minimization (MM) approach with a newly defined MM surrogate. The paper gives a nice introduction to the problem of robust matrix factorization, a complete overview of the state of the art, and good motivation for developing a new algorithm. I found the paper well written, technically sound, and nicely structured.